IEICE global.ieice.org Site

Keyword Search Result

[Keyword] Hidden Markov Model (71 hits)

Hits 61-71 of 71

Speech Recognition Based on Fusion of Visual and Auditory Information Using Full-Frame Color Image
Satoru IGAWA Akio OGIHARA Akira SHINTANI Shinobu TAKAMATSU

LETTER

Vol: E79-A, No: 11, Page(s): 1836-1840
We propose a method that fuses auditory and visual information for accurate speech recognition. For each word, the method computes two probabilities with HMMs, one from each kind of information, and then fuses them by linear combination. In addition, we use full-frame color images as visual information to improve the accuracy of the proposed speech recognition system. We have performed experiments comparing the proposed method with methods that use only auditory or only visual information, and confirmed the validity of the proposed method.
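A minimal Python sketch of the score-level fusion described above. It assumes that each word's HMMs have already produced a log-likelihood for the audio stream and for the visual stream, and that the linear combination is taken over log-likelihoods with a single hypothetical weight, audio_weight. The abstract does not say whether the combination is applied to probabilities or log-probabilities, nor how the weight was chosen.

```python
from typing import Dict


def fuse_scores(audio_loglik: Dict[str, float],
                visual_loglik: Dict[str, float],
                audio_weight: float = 0.7) -> str:
    """Return the word with the highest linearly combined log-likelihood.

    audio_weight is a hypothetical tuning parameter in [0, 1]; the visual
    stream receives 1 - audio_weight.
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    if audio_loglik.keys() != visual_loglik.keys():
        raise ValueError("both streams must score the same vocabulary")
    # Weighted sum of the two per-word scores.
    fused = {
        word: audio_weight * audio_loglik[word]
        + (1.0 - audio_weight) * visual_loglik[word]
        for word in audio_loglik
    }
    return max(fused, key=fused.get)
```

With the default weight, fuse_scores({"ichi": -100.0, "ni": -102.0}, {"ichi": -60.0, "ni": -50.0}) returns "ni" (fused scores -88.0 vs. -86.4), because the visual evidence outweighs the small audio margin; audio_weight=1.0 reduces to audio-only recognition and returns "ichi".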
Tone Recognition of Chinese Dissyllables Using Hidden Markov Models
Xinhui HU Keikichi HIROSE

PAPER

Vol: E78-D, No: 6, Page(s): 685-691
A method of tone recognition has been developed for dissyllabic speech of Standard Chinese based on discrete hidden Markov modeling. As recognition features, a combination of macroscopic and microscopic parameters of the fundamental frequency contour was shown to give better results than either parameter used alone. Speaker normalization was realized by introducing an offset to the fundamental frequency. To avoid recognition errors caused by syllable segmentation, concatenated learning was adopted for training the hidden Markov models. Based on observed fundamental frequency contours of dissyllables, each contour was represented as a series of three tone models: two for the first and second syllables and one for the transition around the syllable boundary. When the second syllable begins with a voiceless consonant, the fundamental frequency contour of the dissyllable contains a segment with no fundamental frequency; the current method fills this segment by linear interpolation. To verify the proposed method, it was compared with other methods, such as representing every dissyllabic contour as a concatenation of two models or assigning a special code to the voiceless part. Tone sandhi was also taken into account by introducing two additional models: one for the half-third tone and one for the first of two consecutive 4th tones. The proposed method achieved an average recognition rate of 96% for 5 male and 5 female speakers.
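The two preprocessing steps named in the abstract, a speaker offset on the fundamental frequency and linear interpolation across the voiceless gap, can be sketched as follows. Working in the log-F0 domain, marking frames without F0 by 0, and the per-speaker constant speaker_offset are all assumptions made for illustration; the paper's exact parameterization is not given here.

```python
import numpy as np


def preprocess_f0(f0_hz, speaker_offset):
    """Interpolate frames without F0 and apply a speaker offset in the log-F0 domain.

    f0_hz: per-frame F0 in Hz, with 0 marking frames that have no F0 (assumed convention).
    speaker_offset: hypothetical per-speaker constant, e.g. the speaker's mean log F0.
    """
    f0 = np.asarray(f0_hz, dtype=float)
    voiced = f0 > 0
    if not voiced.any():
        raise ValueError("contour has no voiced frames")
    frames = np.arange(len(f0))
    # np.interp draws straight lines between voiced frames; leading or
    # trailing gaps are held at the nearest voiced value.
    log_f0 = np.interp(frames, frames[voiced], np.log(f0[voiced]))
    # Subtracting a constant in the log domain equals dividing F0 by a factor.
    return log_f0 - speaker_offset
```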
Speaker-Consistent Parsing for Speaker-Independent Continuous Speech Recognition
Kouichi YAMAGUCHI Harald SINGER Shoichi MATSUNAGA Shigeki SAGAYAMA

PAPER

Vol: E78-D, No: 6, Page(s): 719-724
This paper describes a novel speaker-independent speech recognition method, called "speaker-consistent parsing," which is based on an intra-speaker correlation called the speaker-consistency principle. We focus on the fact that a sentence or a string of words is uttered by an individual speaker even in a speaker-independent task. Thus, the proposed method searches through speaker variations in addition to the contents of utterances. As a result of the recognition process, an appropriate standard speaker is selected for speaker adaptation. This new method is experimentally compared with a conventional speaker-independent speech recognition method. Since the speaker-consistency principle best demonstrates its effect with a large number of training and test speakers, a small-scale experiment may not fully exploit this principle. Nevertheless, even the results of our small-scale experiment show that the new method significantly outperforms the conventional method. In addition, this framework's speaker selection mechanism can drastically reduce the likelihood map computation.
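The speaker-consistency principle can be illustrated with a toy rescoring example. This is not the paper's parser: it assumes a hypothetical precomputed table of per-word log-likelihoods for each sentence hypothesis under each reference speaker, and contrasts a baseline that lets every word pick its best speaker with a decoder that must commit to one speaker for the whole utterance.

```python
from typing import Dict, List, Tuple

# scores[hypothesis][speaker] -> per-word log-likelihoods of that word string
# under that reference speaker's models (a hypothetical precomputed table).
Scores = Dict[str, Dict[str, List[float]]]


def speaker_free_decode(scores: Scores) -> str:
    """Baseline: each word may be matched against a different speaker."""
    def total(hyp: str) -> float:
        per_speaker = list(scores[hyp].values())
        # zip(*...) groups the scores of one word position across speakers.
        return sum(max(word_scores) for word_scores in zip(*per_speaker))
    return max(scores, key=total)


def speaker_consistent_decode(scores: Scores) -> Tuple[str, str]:
    """Choose the word string and a single speaker for the whole utterance."""
    pairs = [(hyp, spk) for hyp in scores for spk in scores[hyp]]
    return max(pairs, key=lambda pair: sum(scores[pair[0]][pair[1]]))


table: Scores = {
    "A": {"s1": [-10.0, -20.0], "s2": [-20.0, -10.0]},
    "B": {"s1": [-12.0, -13.0], "s2": [-30.0, -30.0]},
}
```

On this table the baseline prefers "A" (total -20) only by switching speakers between its two words; under any single speaker "A" scores -30, so the consistent decoder returns ("B", "s1") with -25 and, as a by-product, selects s1 as the standard speaker.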
Off-Line Handwritten Word Recognition with Explicit Character Juncture Modeling
Wongyu CHO Jin H. KIM

PAPER-Image Processing, Computer Graphics and Pattern Recognition

Vol: E78-D, No: 2, Page(s): 143-151
In this paper, a new off-line handwritten word recognition method based on the explicit modeling of character junctures is presented. A handwritten word is regarded as a sequence of characters and junctures of four types, so both characters and junctures are explicitly modeled. A handwriting recognition system employing hidden Markov models as the main statistical framework has been developed based on this scheme. An interconnection network of character and ligature models is constructed to model words of indefinite length. This model can, in principle, describe any form of handwritten word, including discretely spaced words, purely cursive words, and unconstrained words of mixed styles. Efficient encoding and decoding schemes suitable for this model are also presented. The system has shown encouraging performance on a standard USPS database.
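The character/juncture structure can be pictured as a chain of model slots. In this sketch the four juncture types get the hypothetical labels J1 to J4, since the abstract does not name them, and each slot simply lists the alternative models a decoder could enter; ligature models and transition probabilities are omitted.

```python
from typing import List

# Hypothetical labels: the abstract says there are four juncture types
# but does not name them.
JUNCTURE_TYPES = ["J1", "J2", "J3", "J4"]


def word_network(word: str) -> List[List[str]]:
    """Return model slots for a word: char, juncture, char, ..., char.

    Each slot lists the alternative models a decoder may choose from.
    """
    slots: List[List[str]] = []
    for i, ch in enumerate(word):
        if i > 0:
            # Between two characters, any of the juncture models may occur.
            slots.append([f"junc:{j}" for j in JUNCTURE_TYPES])
        slots.append([f"char:{ch}"])
    return slots
```

For example, word_network("ab") yields three slots: char:a, one of the four juncture models, then char:b.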
Speech Recognition Using HMM Based on Fusion of Visual and Auditory Information
Akira SHINTANI Akio OGIHARA Yoshikazu YAMAGUCHI Yasuhisa HAYASHI Kunio FUKUNAGA

LETTER

Vol: E77-A, No: 11, Page(s): 1875-1878
We propose two methods to fuse auditory and visual information for accurate speech recognition. The first method computes, for each word, two probabilities with HMMs, one from each kind of information, and fuses them by linear combination. The second method fuses the two kinds of information using a histogram that expresses the correlation between them. We have performed experiments comparing the proposed methods with the conventional method and confirmed the validity of the proposed methods.
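The second method's correlation histogram might look like the following sketch. It assumes that both streams have been vector-quantized frame by frame, that one joint histogram is trained per word, and that frames are scored independently with add-alpha smoothing; these are illustrative assumptions, not details taken from the letter.

```python
import numpy as np


def joint_histogram(audio_codes, visual_codes, n_audio, n_visual):
    """Count co-occurrences of audio codeword a and visual codeword v in the same frame."""
    hist = np.zeros((n_audio, n_visual))
    # np.add.at accumulates correctly even when an index pair repeats.
    np.add.at(hist, (np.asarray(audio_codes), np.asarray(visual_codes)), 1.0)
    return hist


def histogram_score(hist, audio_codes, visual_codes, alpha=1.0):
    """Log-probability of a test utterance under an add-alpha smoothed histogram."""
    probs = (hist + alpha) / (hist.sum() + alpha * hist.size)
    a = np.asarray(audio_codes)
    v = np.asarray(visual_codes)
    return float(np.log(probs[a, v]).sum())
```

A test utterance would be scored against each word's histogram and assigned to the word with the highest score.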
Spoken Sentence Recognition Based on HMM-LR with Hybrid Language Modeling
Kenji KITA Tsuyoshi MORIMOTO Kazumi OHKURA Shigeki SAGAYAMA Yaneo YANO

PAPER

Vol: E77-D, No: 2, Page(s): 258-265
This paper describes Japanese spoken sentence recognition using hybrid language modeling, which combines the advantages of both syntactic and stochastic language models. As the baseline system, we adopted the HMM-LR speech recognition system, with which we have already achieved good performance for Japanese phrase recognition tasks. Several improvements have been made to this system aimed at handling continuously spoken sentences. The first improvement is HMM training with continuous utterances as well as word utterances. In previous implementations, HMMs were trained with only word utterances. Continuous utterances are included in the HMM training data because coarticulation effects are much stronger in continuous utterances. The second improvement is the development of a sentential grammar for Japanese. The sentential grammar was created by combining inter- and intra-phrase CFG grammars, which were developed separately. The third improvement is the incorporation of stochastic linguistic knowledge, which includes stochastic CFG and a bigram model of production rules. The system was evaluated using continuously spoken sentences from a conference registration task that included approximately 750 words. We attained a sentence accuracy of 83.9% in the speaker-dependent condition.
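The bigram model of production rules can be sketched as maximum-likelihood estimates of P(rule | previous rule) over rule sequences. The sketch assumes derivations are already available as ordered lists of rule strings (for example, leftmost derivations), uses a hypothetical start symbol <s>, and applies a fixed probability floor to unseen pairs; the paper's actual ordering, smoothing, and integration with the stochastic CFG are not described in the abstract.

```python
import math
from collections import Counter
from typing import Dict, Sequence, Tuple


def train_rule_bigram(derivations: Sequence[Sequence[str]]) -> Dict[Tuple[str, str], float]:
    """Maximum-likelihood P(rule | previous rule); '<s>' marks the derivation start."""
    pair_counts: Counter = Counter()
    prev_counts: Counter = Counter()
    for rules in derivations:
        prev = "<s>"
        for rule in rules:
            pair_counts[(prev, rule)] += 1
            prev_counts[prev] += 1
            prev = rule
    return {pair: count / prev_counts[pair[0]] for pair, count in pair_counts.items()}


def derivation_logprob(model: Dict[Tuple[str, str], float],
                       rules: Sequence[str], floor: float = 1e-6) -> float:
    """Sum of log bigram probabilities; unseen pairs get a hypothetical floor."""
    prev, total = "<s>", 0.0
    for rule in rules:
        total += math.log(model.get((prev, rule), floor))
        prev = rule
    return total
```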
Speech Recognition of Isolated Digits Using Simultaneous Generative Histogram
Yasuhisa HAYASHI Akio OGIHARA Kunio FUKUNAGA

LETTER

Vol: E76-A, No: 12, Page(s): 2052-2054
We propose a recognition method for HMMs using a simultaneous generative histogram. The proposed method uses the correlation between two features, which is expressed by a simultaneous generative histogram. The output probabilities of the integrated HMM are then conditioned on the codeword of the other feature. The proposed method is applied to isolated digit recognition to confirm its validity.
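One way to read "output probabilities conditioned on the codeword of another feature" is an emission table B[j, k, m] = P(primary codeword k | state j, secondary codeword m) plugged into the ordinary forward algorithm. The sketch below follows that reading, assuming discrete codewords for both features; it is unscaled and so only suits short sequences.

```python
import numpy as np


def conditioned_forward(pi, A, B, primary, secondary):
    """Forward likelihood with emissions conditioned on a second feature's codeword.

    pi: (N,) initial probabilities; A: (N, N) transition matrix;
    B: (N, K, M) with B[j, k, m] = P(primary=k | state j, secondary=m) (assumed layout).
    primary, secondary: equal-length sequences of codeword indices.
    """
    alpha = pi * B[:, primary[0], secondary[0]]
    for k, m in zip(primary[1:], secondary[1:]):
        # Propagate through transitions, then weight by the conditioned emission.
        alpha = (alpha @ A) * B[:, k, m]
    return float(alpha.sum())
```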
A Hardware Architecture Design Methodology for Hidden Markov Model Based Recognition Systems Using Parallel Processing
Jun-ichi TAKAHASHI

PAPER-Digital Signal Processing

Vol: E76-A, No: 6, Page(s): 990-1000
This paper presents a hardware architecture design methodology for hidden Markov model based recognition systems. With the aim of realizing more advanced and user-friendly systems, an effective architecture has been studied not only for decoding but also for learning, so that the system can adapt itself to the user. Considering real-time decoding and efficient learning procedures, a bi-directional ring array processor is proposed that can handle various kinds of data and perform a large number of computations efficiently using parallel processing. With the array architecture, the HMM sub-algorithms, namely the forward-backward and Baum-Welch algorithms for learning and the Viterbi algorithm for decoding, can be performed in a highly parallel manner. The indispensable HMM implementation techniques of scaling, smoothing, and estimation for multiple observations can also be carried out in the array without disturbing the regularity of parallel processing. Based on the array processor, we propose the configuration of a system that can realize all HMM processes, including vector quantization. This paper also shows that a high processing element (PE) utilization efficiency of about 70% to 90% can be achieved for practical left-to-right HMMs.
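Scaling, one of the implementation techniques listed above, can be shown with a scaled forward algorithm for a discrete HMM. This is a sequential Python sketch, not the ring-array design: the per-state products inside alpha @ A are independent of one another, which is the kind of work an array of processing elements can share, but the mapping onto PEs is not modeled here.

```python
import numpy as np


def scaled_forward_loglik(pi, A, B, obs):
    """Forward algorithm with per-frame scaling for a discrete-output HMM.

    pi: (N,) initial probabilities; A: (N, N) transitions;
    B: (N, K) output probabilities; obs: sequence of codeword indices.
    Returns log P(obs | model) without numerical underflow.
    """
    alpha = pi * B[:, obs[0]]
    c = alpha.sum()
    alpha = alpha / c
    loglik = np.log(c)
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        c = alpha.sum()          # scaling factor for this frame
        alpha = alpha / c        # keep alpha normalized to sum to 1
        loglik += np.log(c)      # product of scale factors = P(obs)
    return float(loglik)
```

For a left-to-right model, A is upper triangular. Because each step renormalizes alpha and accumulates log c, the log-likelihood stays finite even for thousands of frames, where the unscaled product would underflow.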
Three Different LR Parsing Algorithms for Phoneme-Context-Dependent HMM-Based Continuous Speech Recognition
Akito NAGAI Shigeki SAGAYAMA Kenji KITA Hideaki KIKUCHI

PAPER

Vol: E76-D, No: 1, Page(s): 29-37
This paper discusses three approaches for combining an efficient LR parser with phoneme-context-dependent HMMs and compares them through continuous speech recognition experiments. In continuous speech recognition, phoneme-context-dependent allophonic models are considered very helpful for enhancing recognition accuracy, because they precisely represent the allophonic variations caused by differences in phoneme context. Under grammatical constraints based on a context-free grammar (CFG), a generalized LR parser is one of the most efficient parsing algorithms for speech recognition. Therefore, combining allophonic models with a generalized LR parser is a powerful scheme for accurate and efficient speech recognition. In this paper, three phoneme-context-dependent LR parsing algorithms that make it possible to drive allophonic HMMs are proposed. They are outlined as follows: (1) an algorithm that dynamically predicts the phonemic context in the LR parser using a phoneme-context-independent LR table; (2) an algorithm that converts an LR table into a phoneme-context-dependent LR table; (3) an algorithm that converts a CFG into a phoneme-context-dependent CFG. This paper also discusses the results of recognition experiments and compares the performance and efficiency of the three algorithms.
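Phoneme-context-dependent models are often labeled as triphones. The sketch below assumes the common left-phone+right label notation (as used, for instance, by HTK) and a hypothetical "sil" boundary context. The second function hints at the idea behind algorithm (1): once the LR parser predicts the set of possible next phonemes, the right context of the current phoneme, and hence the allophones to activate, follows from that prediction.

```python
from typing import Iterable, List


def to_triphones(phonemes: List[str], boundary: str = "sil") -> List[str]:
    """Map each phoneme to a left-phone+right label; 'sil' pads the ends (assumed)."""
    padded = [boundary] + phonemes + [boundary]
    return [
        f"{padded[i - 1]}-{padded[i]}+{padded[i + 1]}"
        for i in range(1, len(padded) - 1)
    ]


def allophones_for(left: str, phone: str, predicted_next: Iterable[str]) -> List[str]:
    """Allophones of `phone` compatible with the parser's predicted next phonemes."""
    return [f"{left}-{phone}+{right}" for right in sorted(set(predicted_next))]
```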
An SVQ-HMM Training Method Using Simultaneous Generative Histogram
Yasuhisa HAYASHI Satoshi KONDO Nobuyuki TAKASU Akio OGIHARA Shojiro YONEDA

LETTER

Vol: E75-A, No: 7, Page(s): 905-907
This study proposes a new training method for hidden Markov models with separate vector quantization (SVQ-HMM) in speech recognition. The proposed method uses the correlation between two different kinds of features: cepstrum and delta-cepstrum. The correlation is used to reduce the number of reestimations for the two features, which decreases the total computation time for training the models. The proposed method is applied to Japanese isolated digit recognition.
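The two feature streams can be sketched as follows. The delta-cepstrum is computed with the common regression formula over plus or minus K frames (K = 2 and edge padding are assumptions, not details from the letter), and separate vector quantization means each stream is mapped to the nearest codeword of its own codebook. The codebooks would come from a clustering step such as k-means, which is not shown.

```python
import numpy as np


def delta(features: np.ndarray, K: int = 2) -> np.ndarray:
    """Regression delta: sum_k k*(c[t+k] - c[t-k]) / (2*sum_k k^2), edges repeated."""
    T = len(features)
    padded = np.pad(features, ((K, K), (0, 0)), mode="edge")
    denom = 2 * sum(k * k for k in range(1, K + 1))
    out = np.zeros_like(features, dtype=float)
    for k in range(1, K + 1):
        # padded[t + K + k] is frame t + k of the original sequence.
        out += k * (padded[K + k: K + k + T] - padded[K - k: K - k + T])
    return out / denom


def quantize(vectors: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Index of the nearest codeword (squared Euclidean distance) for each frame."""
    d = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)
```

In an SVQ setup, quantize(cepstra, cep_codebook) and quantize(delta(cepstra), delta_codebook) would produce the two codeword streams, with cep_codebook and delta_codebook being hypothetical pretrained codebooks.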
Neural Networks Applied to Speech Recognition
Hiroaki SAKOE

INVITED PAPER

Vol: E75-A, No: 5, Page(s): 546-551
Applications of neural networks are prevalent in speech recognition research. This paper first discusses the suitable role of neural networks (mainly back-propagation-based multilayer networks) in speech recognition. Because speech is a long, variable-length, structured pattern, a direction in which neural networks are used in cooperation with existing structural analysis frameworks is recommended. Research activities are surveyed, including efforts to merge neural networks cooperatively into dynamic-programming-based structural analysis frameworks. It is observed that considerable effort has been devoted to suppressing the high nonlinearity of the network output. Within the scope of this survey, no real-field experiments have been reported.
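As an illustration of the dynamic-programming frameworks mentioned in the survey, the sketch below is a basic dynamic time warping (DTW) recursion with a pluggable local distance. Supplying frame scores from a neural network through local_dist is one hypothetical way to combine the two; the paper surveys several approaches, and this code does not reproduce any particular one.

```python
from typing import Callable

import numpy as np


def dtw(x: np.ndarray, y: np.ndarray,
        local_dist: Callable[[np.ndarray, np.ndarray], float]) -> float:
    """Cumulative DTW distance allowing horizontal, vertical, and diagonal steps.

    local_dist is any frame-to-frame distance, e.g. one derived from a
    network's output (hypothetical).
    """
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = local_dist(x[i - 1], y[j - 1])
            D[i, j] = d + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])
```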